Study of Optimality
Criteria in Design of Experiments


by 

A.  Hedayat 

Department  of  Mathematics 
University  of  Illinois,  Chicago 


1. Preliminary

We perform experiments mainly to estimate, or to test hypotheses
about, some specified unknown parameters of a given model
efficiently. Different considerations lead us to different
criteria for the choice of the "best" design. Although
Definition 2.1 is a response function criterion, most criteria
in design theory are directly related to parameter estimation.
Hence the information matrices play an important role, and thus
by Caratheodory's theorem we can limit our search to discrete
designs which are supported on sets consisting of a finite
number of points.

To see how the optimality criteria in design theory arose, we
first give an example of the very basic motivation: Let d be a
design and let Y be the vector of observations obtained under d.
Assume

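$$ E(Y) = X\boldsymbol{\theta}, \qquad \mathrm{Cov}(Y) = \sigma^2 I, \tag{1.1} $$

where $Y$ is an $n \times 1$ vector of observations, $X$ is an
$n \times k$ matrix with known entries specified by d,
$\boldsymbol{\theta}$ is a $k \times 1$ vector of unknown
constants, and $I$ denotes the identity matrix of order n. In
many cases we are only interested in the subvector
$\boldsymbol{\theta}_1$ of $\boldsymbol{\theta}$. With no loss
of generality we can write
$\boldsymbol{\theta}' = (\boldsymbol{\theta}_1' : \boldsymbol{\theta}_2')$,
where $\boldsymbol{\theta}_1$ is a $v \times 1$ vector,
$1 \le v \le k$. According to the partition
$\boldsymbol{\theta}' = (\boldsymbol{\theta}_1' : \boldsymbol{\theta}_2')$
the Model (1.1) can be written as

$$ E(Y) = (X_1 : X_2)\begin{pmatrix} \boldsymbol{\theta}_1 \\ \boldsymbol{\theta}_2 \end{pmatrix}, \qquad \mathrm{Cov}(Y) = \sigma^2 I. \tag{1.1$'$} $$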

The information matrix of $\boldsymbol{\theta}_1$ under d and
the Model (1.1)' is
$X_1'X_1 - X_1'X_2(X_2'X_2)^{-}X_2'X_1$. We shall denote this by
$M_d$. Note that $M_d = X'X$ when $v = k$, i.e.,
$\boldsymbol{\theta}_1 = \boldsymbol{\theta}$. Now we consider
four cases:

(i) To estimate each component of $\boldsymbol{\theta}$:

Assume $X'X$ is nonsingular, and suppose we want to estimate
each of the individual parameters. By the Gauss-Markov Theorem,
the best linear unbiased estimator (b.l.u.e.)
$\hat{\boldsymbol{\theta}}$ of $\boldsymbol{\theta}$ is given by

$$ \hat{\boldsymbol{\theta}} = (X'X)^{-1}X'Y \tag{1.2} $$

with

$$ \mathrm{Cov}(\hat{\boldsymbol{\theta}}) = \sigma^2 (X'X)^{-1}. \tag{1.3} $$

Let $x_i$ be the i-th column of $X$ and $c_j$ be the j-th column
of $X(X'X)^{-1}$; then from (1.2) and (1.3) it follows that

$$ \hat{\theta}_i = c_i' Y \tag{1.4} $$

with

$$ \mathrm{Var}(\hat{\theta}_i) = \sigma^2 (c_i' c_i). \tag{1.5} $$

Since $(X'X)^{-1}(X'X) = I_k$, we have $c_i' x_j = \delta_{ij}$,
where $\delta_{ij}$ is the Kronecker delta. Applying the Schwarz
inequality, we obtain

$$ (x_i' x_i)(c_i' c_i) \ge (c_i' x_i)^2 = 1, \tag{1.6} $$

hence

$$ \mathrm{Var}(\hat{\theta}_i) \ge \sigma^2 / x_i' x_i. \tag{1.7} $$

Usually, the experimenter has some amount of freedom in the
choice of the k vectors $x_i$. If possible, we would like to
select a design which estimates each of the parameters with
minimum variance. Observe that the equality in (1.6) holds if
and only if $c_i = c\,x_i$ for a constant c; when this holds for
every i, $X'X$ is a diagonal matrix. Hence, theoretically
speaking, the "best design" is one for which $X'X$ is a diagonal
matrix with diagonal entries as large as possible. (E.g., if
$x_{ij} = 0$, $1$, or $-1$, then $x_i' x_i \le n$, and the best
design is the one for which $X'X = nI_k$.) But such a design
does not always exist; see Hedayat and Wallis (1979). When such
designs do not exist, the question arises as to how a best
design should be defined. A reasonable approach is to minimize
the average variance of the estimated parameters, or to minimize
the generalized variance, etc.
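The bound (1.7) can be illustrated numerically. The following is a minimal sketch, not part of the original paper: the two 4-run designs with entries +1/-1 and the helper names `gram`, `inverse_2x2` and `variance_factors` are assumptions made purely for illustration. It computes the diagonal of $(X'X)^{-1}$, that is, $\mathrm{Var}(\hat{\theta}_i)/\sigma^2$, exactly.

```python
from fractions import Fraction


def gram(X):
    """Return X'X for a design matrix given as a list of rows."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)]
            for i in range(k)]


def inverse_2x2(A):
    """Exact inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def variance_factors(X):
    """Diagonal of (X'X)^{-1}, i.e. Var(theta_i) / sigma^2, for k = 2."""
    inv = inverse_2x2(gram(X))
    return [inv[0][0], inv[1][1]]


# Hypothetical designs: n = 4 runs, k = 2 parameters, entries +1 / -1.
orthogonal = [[1, 1], [1, -1], [1, 1], [1, -1]]  # X'X = 4 I
correlated = [[1, 1], [1, 1], [1, 1], [1, -1]]   # X'X = [[4, 2], [2, 4]]

bound = Fraction(1, 4)  # sigma^2 / x_i'x_i with x_i'x_i = n = 4

attained = variance_factors(orthogonal)  # 1/4 for both parameters
missed = variance_factors(correlated)    # 1/3 for both parameters
```

The orthogonal design attains the lower bound for both parameters, while the correlated design, with the same column norms, does not.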

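(ii) To estimate linear functions of $\boldsymbol{\theta}_1$:

Suppose we want to estimate linear functions of
$\boldsymbol{\theta}_1$ in the form $q_1'\boldsymbol{\theta}_1$.
The b.l.u.e. of $q_1'\boldsymbol{\theta}_1$ is
$q_1'\hat{\boldsymbol{\theta}}_1$ with

$$ \mathrm{Var}(q_1'\hat{\boldsymbol{\theta}}_1) = \sigma^2 q_1' M_d^{-} q_1, \tag{1.8} $$

where

$$ \hat{\boldsymbol{\theta}}_1 = M_d^{-} Q_d \tag{1.9} $$

and

$$ Q_d = [X_1' - X_1'X_2(X_2'X_2)^{-}X_2']\,Y, \tag{1.10} $$

while $M_d^{-}$ is any generalized inverse of $M_d$.

In choosing a design for estimating $q_1'\boldsymbol{\theta}_1$
there are many criteria. One of them is based on the following
inequality:

$$ \mu_{\min} \le \frac{q_1' M_d^{-} q_1}{q_1' q_1} \le \mu_{\max}, \tag{1.11} $$

where $\mu_{\max}$ and $\mu_{\min}$ are the maximum and the
minimum (non-zero) eigenvalues of $M_d^{-}$, respectively. This
inequality gives a bound for the variance of
$q_1'\hat{\boldsymbol{\theta}}_1$:

$$ \mu_{\min}\, q_1'q_1\, \sigma^2 \le \mathrm{Var}(q_1'\hat{\boldsymbol{\theta}}_1) \le \mu_{\max}\, q_1'q_1\, \sigma^2. \tag{1.12} $$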

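(iii) To test hypotheses:

Suppose in addition Y is multivariate normal and we want to test
$\theta_1 = \theta_2 = \cdots = \theta_v = 0$ ($v \le k$), i.e.,
$\boldsymbol{\theta}_1 = 0$. (Assume $M_d$ is nonsingular.) Then
the usual F test has a power function depending monotonically
(increasing) on a parameter $\lambda$, where

$$ \lambda = \sigma^{-2}\, \boldsymbol{\theta}_1' M_d\, \boldsymbol{\theta}_1, \tag{1.13} $$

and thus by (1.11) and (1.13)

$$ \frac{\xi_{\min}}{\sigma^2}\, \boldsymbol{\theta}_1'\boldsymbol{\theta}_1 \le \lambda \le \frac{\xi_{\max}}{\sigma^2}\, \boldsymbol{\theta}_1'\boldsymbol{\theta}_1, \tag{1.14} $$

where $\xi_{\max}$ and $\xi_{\min}$ are the maximum and the
minimum eigenvalues of $M_d$.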

(iv) To construct a confidence region:

Again assume Y is multivariate normal and $M_d$ is nonsingular.
A $1-\alpha$ joint confidence region for $\boldsymbol{\theta}_1$
is a solid ellipsoid:

$$ (\hat{\boldsymbol{\theta}}_1 - \boldsymbol{\theta}_1)' M_d (\hat{\boldsymbol{\theta}}_1 - \boldsymbol{\theta}_1) \le \sigma^2 \chi^2_{\alpha}(v), \quad \text{if } \sigma^2 \text{ is known}, \tag{1.15} $$

where $\chi^2_{\alpha}(v)$ is the $1-\alpha$ percentile of the
$\chi^2$ distribution with v degrees of freedom. Or

$$ (\hat{\boldsymbol{\theta}}_1 - \boldsymbol{\theta}_1)' M_d (\hat{\boldsymbol{\theta}}_1 - \boldsymbol{\theta}_1) \le v S^2 F_{\alpha}(v, n-r), \quad \text{if } \sigma^2 \text{ is unknown}, \tag{1.16} $$

where $F_{\alpha}(v, n-r)$ is the $1-\alpha$ percentile of the F
distribution with v and n-r degrees of freedom, and
$S^2 = Y'[I - X(X'X)^{-}X']Y/(n-r)$ is an unbiased estimator of
$\sigma^2$ (assume rank$(X'X) = r$).

We observe that:

(a) The volume (expected volume, if $\sigma^2$ is unknown) of
the above ellipsoid is proportional to the square root of
$\det(M_d^{-1})$.

(b) The semi-axes (expected semi-axes, if $\sigma^2$ is unknown)
of the above ellipsoid are proportional to the square roots of
the eigenvalues of $M_d^{-1}$.

In Section 2 we shall study some well-known optimality criteria.
Sections 3-7 present some generalizations of those in Section 2,
or some recent developments in the determination of optimal
designs. Throughout this paper we write the optimality criteria
as a class of convex nonincreasing functionals $\Phi$ on the set
of information matrices rather than the class of convex
nondecreasing functionals $\psi$ on the set of covariance
matrices, since the former is more general than the latter. For
instance, when the covariance matrix of interest is equal to
$M_d^{-1}$ (as in (1.13)), we have $\Phi(M_d) = \psi(M_d^{-1})$,
which is convex in $M_d$ if $\psi$ is convex in $M_d^{-1}$, but
not conversely. The strict inclusion of one class in
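the other is illustrated by the fact that, if
$\lambda_1(M_d) \ge \cdots \ge \lambda_v(M_d)$ are the
eigenvalues of $M_d^{-1}$, then
$\sum_i \lambda_i^{1/2}(M_d) = \mathrm{Tr}(M_d^{-1/2})$ is convex
in $M_d$ but $\sum_i \lambda_i^{1/2}(M_d)$ is not convex in
$M_d^{-1}$.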

Notation used in the rest of this paper is listed below:

$\mathcal{B}_v$ = the class of all $v \times v$ nonnegative
definite matrices.

$\mathcal{B}_{v,0}$ = the class of all $v \times v$ nonnegative
definite matrices with zero row and column sums.

$\mathcal{D}$ = the class of designs under consideration.

$\mathcal{C}$ = $\{M_d,\ d \in \mathcal{D}\}$.

Also, let $\mu_{d1} \ge \mu_{d2} \ge \cdots \ge \mu_{dv}$ be the
eigenvalues of $M_d$. Note that if
$\mathcal{C} \subseteq \mathcal{B}_{v,0}$, then $\mu_{dv} = 0$
for all $d \in \mathcal{D}$. If necessary, we let $\xi$ denote
an approximate design (a probability measure on the experimental
space) and $M_{\xi}$ be the associated information matrix.

To  avoid  messy  expressions,  the  dimensions  of  matrices 
should  be  deduced  from  the  context  if  they  are  not  explicitly 
specified. 


Assume $\mathcal{C} \subseteq \mathcal{B}_v$.

I. G-optimality.

Smith (1918) introduced a response function criterion which can
be stated as follows:

Definition 2.1. A design $\xi^* \in \mathcal{D}$ is G-optimal if
and only if

$$ \min_{\xi} \max_{x \in \chi} \mathrm{Var}_{\xi}\big(\widehat{EY}_x\big) = \max_{x \in \chi} \mathrm{Var}_{\xi^*}\big(\widehat{EY}_x\big), $$

where $\widehat{EY}_x$ is the b.l.u.e. of $EY_x$ and $\chi$ is
the experimental space. Kiefer called it G-optimal (for global
or minimax), since we are minimizing the maximum variance of any
predicted value over the experimental space.

II. D-optimality.

Definition 2.2. A design $d^* \in \mathcal{D}$ is D-optimal if
and only if $M_{d^*}$ is non-singular and
$\min_{d \in \mathcal{D}} \det(M_d^{-1}) = \det(M_{d^*}^{-1})$.

Here, "D-" stands for determinant. The concept was introduced
and studied by Wald (1943) and applied by Mood (1946). This
criterion has many appealing properties:

(1) Under normality, if d* is D-optimal, d* minimizes:

(a) The volume (or expected volume, if $\sigma^2$ is unknown and
rank$(M_d)$ is invariant under d) of the smallest invariant
confidence region on $\theta_1, \theta_2, \ldots, \theta_v$ for
any given confidence coefficient.

(b) The generalized variance of the estimators of the parameters
(see remark below).

(2) In the class of approximate designs, D-optimality is
equivalent to G-optimality whenever $v = k$, i.e.,
$\boldsymbol{\theta}_1 = \boldsymbol{\theta}$.

(3) The design remains D-optimal if one changes the scale of the
parameters: Let $\theta_1', \theta_2', \ldots, \theta_v'$ be
related to $\theta_1, \theta_2, \ldots, \theta_v$ by a
non-singular linear transformation. If d* is D-optimal for
$\theta_1, \ldots, \theta_v$, then d* is also D-optimal for
$\theta_1', \ldots, \theta_v'$. The analogue for other criteria
is false in even the simplest settings.
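Property (3) can be checked on a toy case. The sketch below is illustrative only: the two covariance matrices (corresponding to hypothetical information matrices diag(1, 5) and diag(2, 2)), the rescaling matrix, and the function names are all assumptions. It compares the two designs by the determinant and by the trace of the covariance matrix $A\,M_d^{-1}A'$ of the transformed estimator.

```python
from fractions import Fraction as F


def transformed_cov(cov, A):
    """Covariance A cov A' of the linearly transformed estimator (2x2)."""
    n = 2
    AC = [[sum(A[i][k] * cov[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(AC[i][k] * A[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def det2(C):
    return C[0][0] * C[1][1] - C[0][1] * C[1][0]


def trace2(C):
    return C[0][0] + C[1][1]


# Hypothetical designs, given by M_d^{-1} (sigma^2 = 1).
cov_d1 = [[F(1), F(0)], [F(0), F(1, 5)]]      # M_d1 = diag(1, 5)
cov_d2 = [[F(1, 2), F(0)], [F(0), F(1, 2)]]   # M_d2 = diag(2, 2)


def d1_is_better(A):
    """Return (d1 better under D-criterion, d1 better under A-criterion)."""
    c1 = transformed_cov(cov_d1, A)
    c2 = transformed_cov(cov_d2, A)
    return det2(c1) < det2(c2), trace2(c1) < trace2(c2)


identity = [[F(1), F(0)], [F(0), F(1)]]
rescale = [[F(1, 2), F(0)], [F(0), F(1)]]     # halve the first parameter

original_scale = d1_is_better(identity)   # (True, False)
changed_scale = d1_is_better(rescale)     # (True, True)
```

In this example the determinant ranking is unchanged by the rescaling, because both determinants are multiplied by the same factor $\det(A)^2$, whereas the trace ranking flips.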

Remark: Suppose $X = (X_1, X_2, \ldots, X_n)'$ is distributed as
multivariate $N(\mu, V)$. The determinant of V is called the
generalized variance of X, as defined by Wilks (1932).

In the theory of linear regression, under the normal assumption,
$\hat{\boldsymbol{\theta}}_1 = (\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_v)'$
is distributed as
$N(\boldsymbol{\theta}_1, M_d^{-1}\sigma^2)$, so the generalized
variance of $(\hat{\theta}_1, \ldots, \hat{\theta}_v)$ is equal
to the determinant of $M_d^{-1}\sigma^2$, which is the product
of $\sigma^{2v}$ and $\det M_d^{-1}$ (assume $M_d$ is
non-singular).

III. L-optimality.

Definition 2.3. A design $d^* \in \mathcal{D}$ is linear optimal
(L-optimal) if and only if
$\min_{d \in \mathcal{D}} L(M_d^{-1}) = L(M_{d^*}^{-1})$, where
L is a nonnegative linear functional on $\mathcal{B}_v$.

One of the most useful linear criteria of optimality is
A-optimality, defined when

$$ L(M_d^{-1}) = \mathrm{Tr}(M_d^{-1}). $$

Definition 2.4. A design $d^* \in \mathcal{D}$ is A-optimal if
and only if $M_{d^*}$ is non-singular and
$\min_{d \in \mathcal{D}} \mathrm{Tr}(M_d^{-1}) = \mathrm{Tr}(M_{d^*}^{-1})$.

"A-" stands for average. In a statistical sense, if d* is
A-optimal, it minimizes the average variance of
$\hat{\theta}_1, \ldots, \hat{\theta}_v$. This criterion was
introduced and studied by Elfving (1952) and Chernoff (1953).


IV. E-optimality.

Definition 2.5. A design $d^* \in \mathcal{D}$ is E-optimal if
and only if
$\min_{d \in \mathcal{D}} \mu_{dv}^{-1} = \mu_{d^*v}^{-1}$.
E-optimality was first considered in hypothesis testing (Wald
(1943), Ehrenfeld (1955)). "E-" stands for eigenvalue. It has
the following properties:

(1) In hypothesis testing: Under the normality assumption, an
E-optimal design maximizes the minimum power of the associated
F-test of size $\alpha$ on the contour
$\boldsymbol{\theta}_1'\boldsymbol{\theta}_1 = c$ for every
$\alpha$ and c. (See (1.14).)

(2) In point estimation: An E-optimal design minimizes the
maximum variance of the b.l.u.e.'s of the
$q_1'\boldsymbol{\theta}_1$ over all $v \times 1$ vectors $q_1$
with $q_1'q_1 = 1$. (See (1.12).)

(3) In interval estimation: An E-optimal design minimizes the
largest semi-axis of the (hyper) ellipsoid when normality
assumptions are made on the observations.

Now it seems natural to specify some optimality functional
$\Phi$ on $\mathcal{C}$ and to pose the problem: Find d to
minimize $\Phi(M_d)$. We call $\Phi$ an optimality criterion.
The above well-known criteria are then:

D-optimality:
$$ \Phi_D(M_d) = \det(M_d^{-1}) = \prod_{i=1}^{v} \mu_{di}^{-1} \tag{2.1} $$

L-optimality:
$$ \Phi_L(M_d) = L(M_d^{-1}) \tag{2.2} $$

A-optimality:
$$ \Phi_A(M_d) = \mathrm{Tr}(M_d^{-1}) = \sum_{i=1}^{v} \mu_{di}^{-1} \tag{2.3} $$

E-optimality:
$$ \Phi_E(M_d) = \mu_{dv}^{-1}. \tag{2.4} $$

(2.1), (2.3) and (2.4) are regarded as infinite if $M_d$ is
singular.

Note that in case $\mathcal{C} \subseteq \mathcal{B}_{v,0}$ the
definitions of D-, A-, E-optimality are similar; one can simply
replace the index v in (2.1), (2.3) and (2.4) by v-1.
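As a minimal sketch (the function names and the eigenvalue vectors are assumptions chosen for illustration), the criteria (2.1), (2.3) and (2.4) can be evaluated directly from the eigenvalues of the information matrix, with the convention that a singular matrix gives an infinite value:

```python
import math


def phi_D(mu):
    """det(M^{-1}) = product of 1/mu_i; infinite if M is singular."""
    if min(mu) <= 0:
        return math.inf
    return math.prod(1.0 / m for m in mu)


def phi_A(mu):
    """Tr(M^{-1}) = sum of 1/mu_i; infinite if M is singular."""
    if min(mu) <= 0:
        return math.inf
    return sum(1.0 / m for m in mu)


def phi_E(mu):
    """Reciprocal of the smallest eigenvalue; infinite if M is singular."""
    if min(mu) <= 0:
        return math.inf
    return 1.0 / min(mu)


# Hypothetical eigenvalue vectors, all with the same trace 6.
balanced = [2.0, 2.0, 2.0]
unbalanced = [3.0, 2.0, 1.0]
singular = [3.0, 3.0, 0.0]

for name, mu in [("balanced", balanced), ("unbalanced", unbalanced),
                 ("singular", singular)]:
    print(name, phi_D(mu), phi_A(mu), phi_E(mu))
```

Among these three vectors with equal trace, the one with equal eigenvalues gives the smallest value of each criterion, which is the situation exploited in Section 3.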


3. S-optimality and (M,S)-optimality:

Assume $\mathcal{C} \subseteq \mathcal{B}_v$. When
$\mathrm{Tr}(M_d) = \sum_i \mu_{di} = A$ is a constant for all
$d \in \mathcal{D}$, the D-, A-, E-optimalities are attained
when all the $\mu_{di}$'s are equal (we call such a design a
symmetric design). Unfortunately, symmetric designs do not
always exist. Intuitively, in the absence of a symmetric design,
we may want to believe that the "closest" design to the
hypothetical symmetric design is a reasonable design to use.
Shah (1960) proposed the Euclidean distance between the vectors
of eigenvalues of the designs as the measure of distance between
the corresponding designs. Thus, according to Shah (1960), if
there is no symmetric design in $\mathcal{D}$, we should use the
design d which minimizes the Euclidean distance between
$(\mu_{d1}, \ldots, \mu_{dv})$ and the vector of eigenvalues of
the hypothetical symmetric design, $(A/v, \ldots, A/v)$, i.e.,

$$ \Big[ \sum_{i} \Big( \mu_{di} - \sum_{i} \mu_{di}/v \Big)^2 \Big]^{1/2}. \tag{3.1} $$

Clearly, this is only a heuristic approach with no statistical
justification. However, it has the merit that when
$\mathrm{Tr}(M_d)$ is a constant, the minimization of (3.1) is
equivalent to that of
$\mathrm{Tr} M_d^2 = \sum_i \mu_{di}^2$, which is easier to
handle.

Define $\Phi: \mathcal{B}_v \to [0, +\infty]$ such that

$$ \Phi(M_d) = \mathrm{Tr} M_d^2 = \sum_{i} \mu_{di}^2. \tag{3.2} $$

Formally, we have:

Definition 3.1. Suppose $\mathrm{Tr}(M_d) = A$ is a constant for
all $d \in \mathcal{D}$. A design $d^* \in \mathcal{D}$ is
called S-optimal if and only if d* minimizes $\Phi(M_d)$ (as in
(3.2)) over all $d \in \mathcal{D}$.

Motivated by Shah's criterion, Eccleston and Hedayat (1974)
proposed a similar procedure in the case when $\mathrm{Tr} M_d$
is not a constant.

Let $\mathcal{C}' \subseteq \mathcal{C}$ be such that the
matrices in $\mathcal{C}'$ have maximum trace.

Definition 3.2. A design $d^* \in \mathcal{D}$ is (M,S)-optimal
if and only if $M_{d^*} \in \mathcal{C}'$ and d* minimizes
$\Phi(M_d)$ (as in (3.2)) over all $d \in \mathcal{D}'$, where
$\mathcal{D}' = \{d \in \mathcal{D} : M_d \in \mathcal{C}'\}$.

A geometric interpretation of (M,S)-optimality can be given as
follows. Set

$$ S_A = \Big\{ (\mu_{d1}, \ldots, \mu_{dv}) : \mu_{di} > 0,\ \sum_i \mu_{di} = A \Big\}, $$

and

$$ S_{AB} = \Big\{ (\mu_{d1}, \ldots, \mu_{dv}) : \mu_{di} > 0,\ \sum_i \mu_{di} = A,\ \sum_i \mu_{di}^2 = B \Big\}. $$

Then $S_A$ is an open simplex and $S_{AB}$ is part of a
(v-2)-dimensional sphere with $(A/v, \ldots, A/v)$ as the center
and the quantity

$$ \rho = \Big[ \sum_i \Big( \mu_{di} - \sum_i \mu_{di}/v \Big)^2 \Big]^{1/2} $$

as the radius, when $B \ge A^2/v$. The procedure of finding an
(M,S)-optimal design is the same as choosing a simplex $S_A$ as
far away from the origin as possible, and then finding a design
whose vector of eigenvalues lies on $S_A$ and is closest to the
center of the simplex in the Euclidean sense.

In the $\mathcal{B}_{v,0}$ context, the same arguments hold
except for replacing v by v-1.
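The two-stage procedure of Definition 3.2 can be sketched as follows. This is an illustration under stated assumptions: the three candidate matrices are hypothetical members of $\mathcal{B}_{3,0}$ (zero row and column sums), and the helper names are not from the paper. The sketch uses the fact that, for a symmetric matrix, $\mathrm{Tr} M^2$ equals the sum of the squared entries, so no eigenvalue computation is needed.

```python
def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def trace_of_square(M):
    """Tr(M^2) for symmetric M: the sum of squared entries,
    which equals the sum of squared eigenvalues."""
    return sum(x * x for row in M for x in row)


def ms_optimal(candidates):
    """Names of the (M,S)-optimal designs among {name: information matrix}:
    first keep the matrices of maximum trace, then minimize Tr(M^2)."""
    max_tr = max(trace(M) for M in candidates.values())
    finalists = {n: M for n, M in candidates.items() if trace(M) == max_tr}
    best = min(trace_of_square(M) for M in finalists.values())
    return sorted(n for n, M in finalists.items()
                  if trace_of_square(M) == best)


# Hypothetical information matrices with integer entries.
candidates = {
    "d1": [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]],   # trace 6, Tr M^2 = 18
    "d2": [[3, -2, -1], [-2, 3, -1], [-1, -1, 2]],   # trace 8, Tr M^2 = 34
    "d3": [[4, -2, -2], [-2, 2, 0], [-2, 0, 2]],     # trace 8, Tr M^2 = 40
}

winner = ms_optimal(candidates)   # ['d2']
```

Note that d1 has equal nonzero eigenvalues but is excluded at the first stage because its trace is not maximal; this is exactly the ordering of priorities that the geometric description above expresses.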

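4. $\Phi_p$-criteria.

In Kiefer (1974), the following family of criteria was
introduced. We shall describe it in the $\mathcal{B}_v$ context.

Let

$$ \Phi_p(M_d) = \Big[ \frac{1}{v} \mathrm{Tr}(M_d^{-p}) \Big]^{1/p} = \Big[ \frac{1}{v} \sum_{i=1}^{v} \mu_{di}^{-p} \Big]^{1/p}, \quad 0 < p < \infty. \tag{4.1} $$

Definition 4.1. A design $d^* \in \mathcal{D}$ is
$\Phi_p$-optimal if and only if d* minimizes $\Phi_p(M_d)$,
$d \in \mathcal{D}$.

When $\mathcal{C} \subseteq \mathcal{B}_v$, we may restrict
ourselves to d with $M_d$ nonsingular. The following theorem
gives a connection between the D-, A-, E-criteria and the
$\Phi_p$-criterion.

Theorem 4.1.

(i) $\Phi_1(M_d) = \frac{1}{v} \mathrm{Tr}(M_d^{-1}) = \frac{1}{v} \sum_{i} \mu_{di}^{-1}$.

(ii) $\Phi_0(M_d) = \lim_{p \to 0} \Phi_p(M_d) = \Big( \prod_{i=1}^{v} \mu_{di}^{-1} \Big)^{1/v}$.   (4.2)

(iii) $\Phi_\infty(M_d) = \lim_{p \to \infty} \Phi_p(M_d) = \mu_{dv}^{-1}$.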
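Proof: (i) is clear.

(ii) Since $\Phi_p(M_d) = \big[ \frac{1}{v} \sum_{i=1}^{v} \mu_{di}^{-p} \big]^{1/p}$,

$$ \log \Phi_p(M_d) = \frac{1}{p} \log \Big[ \frac{1}{v} \sum_{i=1}^{v} \mu_{di}^{-p} \Big]. $$

As p tends to zero, the right-hand side takes the form 0/0, so
by applying L'Hospital's rule we obtain

$$ \lim_{p \to 0} \log \Phi_p(M_d) = \lim_{p \to 0} \frac{\frac{1}{v} \sum_{i=1}^{v} \mu_{di}^{-p} \log \mu_{di}^{-1}}{\frac{1}{v} \sum_{i=1}^{v} \mu_{di}^{-p}} = \frac{1}{v} \sum_{i=1}^{v} \log \mu_{di}^{-1} = \frac{1}{v} \log \prod_{i=1}^{v} \mu_{di}^{-1}. $$

Hence

$$ \lim_{p \to 0} \Phi_p(M_d) = \Big( \prod_{i=1}^{v} \mu_{di}^{-1} \Big)^{1/v}. $$

(iii) Let $\nu_{di} = \mu_{dv}\, \mu_{di}^{-1}$. Then

$$ \log \Phi_p(M_d) = \frac{1}{p} \log \Big[ \frac{1}{v} \sum_{i=1}^{v} (\mu_{dv}^{-1} \nu_{di})^{p} \Big] = \frac{1}{p} \log \Big[ \frac{1}{v}\, \mu_{dv}^{-p} \sum_{i=1}^{v} \nu_{di}^{p} \Big] = \frac{1}{p} \log \frac{1}{v} + \log \mu_{dv}^{-1} + \frac{1}{p} \log \sum_{i=1}^{v} \nu_{di}^{p}. $$

Since $0 < \nu_{di} \le 1$ for all i, and $\nu_{dv} = 1$, we
conclude

$$ 0 \le \log \Big( \sum_{i=1}^{v} \nu_{di}^{p} \Big) \le \log v. $$

Hence

$$ \lim_{p \to \infty} \frac{1}{p} \log \Big( \sum_{i=1}^{v} \nu_{di}^{p} \Big) = 0. $$

Therefore

$$ \lim_{p \to \infty} \log \Phi_p(M_d) = \log \mu_{dv}^{-1}, $$

and consequently

$$ \lim_{p \to \infty} \Phi_p(M_d) = \mu_{dv}^{-1}. $$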


Corollary 4.1.

(i) When p = 1, the $\Phi_p$-criterion is equivalent to
A-optimality.

(ii) When p approaches 0, the limiting case of the
$\Phi_p$-criterion is equivalent to D-optimality.

(iii) When p approaches $\infty$, the limiting case of the
$\Phi_p$-criterion is equivalent to E-optimality.
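The three cases of the corollary can be observed numerically. The following is a small sketch in which the eigenvalue vector and the chosen values of p are arbitrary assumptions; `phi_p` implements the eigenvalue form of (4.1).

```python
import math


def phi_p(mu, p):
    """Phi_p(M_d) = [ (1/v) * sum_i mu_i^(-p) ]^(1/p), for 0 < p < infinity."""
    v = len(mu)
    return (sum(m ** (-p) for m in mu) / v) ** (1.0 / p)


mu = [4.0, 2.0, 1.0]   # hypothetical eigenvalues of a nonsingular M_d

phi_0 = math.prod(1.0 / m for m in mu) ** (1.0 / len(mu))   # (det M^{-1})^(1/v)
phi_1 = sum(1.0 / m for m in mu) / len(mu)                  # Tr(M^{-1}) / v
phi_inf = 1.0 / min(mu)                                     # 1 / mu_dv

for p in (1e-6, 1.0, 10.0, 200.0):
    print(p, phi_p(mu, p))
print("limits:", phi_0, phi_1, phi_inf)
```

For this vector the values increase with p, from about 0.5 (the D-type limit) through 0.583 at p = 1 (the A-type value) towards 1 (the E-type limit); the convergence as p grows is slow, of order $(\log v)/p$ on the log scale, as the proof of part (iii) suggests.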


Remark: The $\Phi_p$-criterion in the $\mathcal{B}_{v,0}$
context is

$$ \Phi_p(M_d) = \Big[ \frac{1}{v-1} \sum_{i=1}^{v-1} \mu_{di}^{-p} \Big]^{1/p}. $$


5. Universal Optimality.

In Kiefer (1975), a strong optimality criterion was considered.
Here, we restrict ourselves to $\mathcal{B}_{v,0}$. (In the
$\mathcal{B}_v$ context, it is easier.)

Definition 5.1. We say $d^* \in \mathcal{D}$ is a universally
optimal design if d* minimizes $\Phi(M_d)$, $d \in \mathcal{D}$,
for any $\Phi: \mathcal{B}_{v,0} \to (-\infty, +\infty]$
satisfying:

(i) $\Phi$ is convex,
(ii) $\Phi(bM)$ is nonincreasing in the scalar $b \ge 0$ for
     each $M \in \mathcal{B}_{v,0}$,                         (5.1)
(iii) $\Phi$ is invariant under each permutation of rows and
      (the same on) columns.

Since $-\mathrm{Tr}(M)$ satisfies (5.1), we immediately have the
following theorem:

Theorem 5.1. If $d^* \in \mathcal{D}$ is universally optimal,
then $\mathrm{Tr} M_{d^*}$ is maximum.


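Definition 5.2. A matrix M is called a completely symmetric
(c.s.) matrix if $M = \alpha I_v + \beta J_v$, where
$\alpha, \beta$ are scalars, $I_v$ is the identity matrix, and
$J_v$ consists of all 1's.

Lemma 5.1. If $M_1$ and $M_2$ are two completely symmetric
matrices in $\mathcal{B}_{v,0}$ (with $M_2 \ne 0$), then there
exists an h such that $M_1 = hM_2$.

Proof: Suppose $M_1 = \alpha_1 I_v + \beta_1 J_v$ and
$M_2 = \alpha_2 I_v + \beta_2 J_v$. Then

$M_i \mathbf{1} = 0 \Rightarrow \alpha_i + v\beta_i = 0$ for i = 1, 2,

so $\alpha_i = -v\beta_i$. Let $h = \beta_1/\beta_2$. Then

$$ hM_2 = h(-v\beta_2 I_v + \beta_2 J_v) = -v\beta_1 I_v + \beta_1 J_v = M_1. $$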

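The following theorems are simple tools in determining such an
optimal design.

Theorem 5.2. Suppose $\mathcal{C} \subseteq \mathcal{B}_{v,0}$
contains an $M_{d^*}$ for which

(a) $M_{d^*}$ is c.s.,
(b) $\mathrm{Tr} M_{d^*} = \max_{d \in \mathcal{D}} \mathrm{Tr} M_d$.    (5.2)

Then d* is universally optimal in $\mathcal{D}$.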

Proof: From Theorem 5.1 it suffices to show that $M_{d^*}$
minimizes $\Phi(M_d)$ for all $\Phi$ satisfying (5.1) and
$M_d \in \mathcal{C}'$, where
$\mathcal{C}' \subseteq \mathcal{C}$ consists of the matrices
which have maximum trace.

For any $M_d \in \mathcal{C}'$, let $\tau M_d$ be obtained from
$M_d$ by permuting rows and columns according to $\tau$, and let
$\bar{M}_d = \sum_{\tau} \tau M_d / v!$ be the symmetrized
version of $M_d$. By (5.1)(i) and (iii) we have

$$ \Phi(\bar{M}_d) \le \sum_{\tau} \Phi(\tau M_d)/v! = \Phi(M_d), \tag{5.3} $$

for any $\Phi$ satisfying (5.1). Of course $\bar{M}_d$ need not
be in $\mathcal{C}$, but $\bar{M}_d$ is c.s. and in
$\mathcal{B}_{v,0}$. By Lemma 5.1, $\bar{M}_d$ is of the form
$bM_{d^*}$ for some $b \ge 0$. Now
$\mathrm{Tr}(\bar{M}_d) = \mathrm{Tr}(M_d)$. But
$\mathrm{Tr}(M_d) = \mathrm{Tr}(M_{d^*})$ by assumption. This
implies b = 1 and hence $\bar{M}_d = M_{d^*}$. By (5.3),
$\Phi(M_{d^*}) = \Phi(\bar{M}_d) \le \Phi(M_d)$ for all $\Phi$
satisfying (5.1) and $M_d \in \mathcal{C}'$. Therefore d* is
universally optimal.
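The symmetrization step used in this proof can be made concrete. The sketch below is an illustration under assumptions: the matrix M is a hypothetical member of $\mathcal{B}_{3,0}$ and the helper names are invented. It averages $\tau M$ over all v! simultaneous permutations of rows and columns, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def permute(M, perm):
    """Apply the same permutation to the rows and the columns of M."""
    v = len(M)
    return [[M[perm[i]][perm[j]] for j in range(v)] for i in range(v)]


def symmetrize(M):
    """Average of tau(M) over all v! permutations tau."""
    v = len(M)
    total = [[Fraction(0) for _ in range(v)] for _ in range(v)]
    for perm in permutations(range(v)):
        PM = permute(M, perm)
        for i in range(v):
            for j in range(v):
                total[i][j] += PM[i][j]
    count = factorial(v)
    return [[x / count for x in row] for row in total]


# Hypothetical information matrix with zero row and column sums.
M = [[3, -2, -1], [-2, 3, -1], [-1, -1, 2]]
M_bar = symmetrize(M)

# M_bar is completely symmetric: 8/3 on the diagonal, -4/3 elsewhere,
# i.e. 4*I - (4/3)*J, with the same trace (8) as M.
```

For this particular M the nonzero eigenvalues are 5 and 3, while those of the symmetrized matrix are 4 and 4; the A-type value 1/5 + 1/3 exceeds 1/4 + 1/4, in agreement with (5.3).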

Theorem 5.3. Suppose an $M_{d^*}$ satisfying (5.2) exists. Let
$\Phi: \mathcal{B}_{v,0} \to (-\infty, +\infty]$ be a function
satisfying (5.1). If, in addition, $\Phi$ is strictly convex
(and hence also "nonincreasing" in property (ii) is replaced by
"decreasing"), then every $\Phi$-optimal d' has
$M_{d'} = M_{d^*}$ (i.e., d' is also universally optimal).

Proof: Let $\bar{M}_{d'} = \sum_{\tau} \tau M_{d'}/v!$. Since
$\Phi$ is strictly convex, we have

$$ \Phi(\bar{M}_{d'}) \le \sum_{\tau} \Phi(\tau M_{d'})/v! = \Phi(M_{d'}), \tag{5.4} $$

with equality only if $M_{d'} = \bar{M}_{d'}$. Again
$\bar{M}_{d'}$ is c.s. and in $\mathcal{B}_{v,0}$; this implies
that $\bar{M}_{d'} = bM_{d^*}$ for some $b \ge 0$. Since
$M_{d^*}$ satisfies (5.2),
$\mathrm{Tr}(M_{d^*}) \ge \mathrm{Tr}(M_{d'})$, which implies
$b \le 1$. But if b < 1,

$$ \Phi(M_{d'}) \ge \Phi(\bar{M}_{d'}) = \Phi(bM_{d^*}) > \Phi(M_{d^*}). \tag{5.5} $$

This contradicts the assumption that d' is $\Phi$-optimal. From
(5.4) and (5.5) we can conclude that

b = 1 and $M_{d'} = \bar{M}_{d'} = M_{d^*}$.

And d' is indeed universally optimal. □

Let $\Phi_1$ and $\Phi_2$ be two convex functions satisfying
(5.1). Suppose d* is $\Phi_1$-optimal; the following theorem
gives a sufficient condition for d* to be $\Phi_2$-optimal.

Theorem 5.4. If $\Phi_1 \le \Phi_2$ on $\mathcal{C}$ and if
$\Phi_1(M_{d^*}) = \Phi_2(M_{d^*})$, then d* is
$\Phi_2$-optimal if d* is $\Phi_1$-optimal.

Proof: Assume d* is $\Phi_1$-optimal; then
$\Phi_1(M_{d^*}) \le \Phi_1(M_d)$ for all $d \in \mathcal{D}$.
By assumption

$$ \Phi_2(M_{d^*}) = \Phi_1(M_{d^*}) \le \Phi_1(M_d) \le \Phi_2(M_d). $$

Hence the result.

Example 5.1. A useful family of criteria in the
$\mathcal{B}_{v,0}$ context is the $\Phi_p$-criteria, for
$0 < p < \infty$, with the limiting values

$$ \Phi_0(M_d) = \Big( \prod_{i=1}^{v-1} \mu_{di} \Big)^{-1/(v-1)} \quad \text{and} \quad \Phi_\infty(M_d) = \mu_{d(v-1)}^{-1}. $$

Here $p < q \Rightarrow \Phi_p(M_d) \le \Phi_q(M_d)$, with
equality if and only if all $\mu_{di}$ are equal. Hence from
Theorem 5.4, if $M_{d^*}$ is c.s. and d* is $\Phi_p$-optimal,
then d* is $\Phi_q$-optimal for all q > p.

In the absence of universal optimality, some weaker optimality
results which have useful statistical implications (for
instance, they include the A-, E-, D-criteria and all
$\Phi_p$-criteria, $0 < p < \infty$) have been discussed by
Kiefer (1974).

Observe that (4.1) and (4.2) are equivalent to the following:

(a) $\Phi_p^*(M_d) = \sum_i \mu_{di}^{-p}$, $0 < p < \infty$;
(b) $\Phi_0^*(M_d) = -\sum_i \log \mu_{di}$;                   (5.6)
(c) $\Phi_\infty^*(M_d) = \mu_{d(v-1)}^{-1}$.

Let

$$ \Phi^*(M_d) = \sum_i f(\mu_{di}), \tag{5.7} $$

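where f is convex on $[0, +\infty)$. We want to find conditions
under which a design d* is $\Phi^*$-optimal.

Lemma 5.2. If f is a convex function on $[0, +\infty)$, then

$$ \sum_{i=1}^{v-1} f(\mu_{di}) \ge \frac{v-1}{v} \sum_{j=1}^{v} f\Big( \frac{v}{v-1}\, m_{djj} \Big) \tag{5.8} $$

for any $M_d$ in $\mathcal{B}_{v,0}$, where $m_{djj}$ is the
j-th diagonal element of $M_d$, with equality if all the
$\mu_{di}$'s are equal or $M_d$ is c.s.

Proof: Let P be the $(v-1) \times v$ orthonormal matrix such
that

$$ P M_d P' = \Lambda_d = \mathrm{diag}(\mu_{d1}, \ldots, \mu_{d(v-1)}). $$

Augment P with the row $v^{-1/2}\mathbf{1}'$ and call the
resulting matrix P*, so that

$$ P^* M_d P^{*\prime} = \begin{pmatrix} \Lambda_d & 0 \\ 0 & 0 \end{pmatrix}. $$

Assume $P = (p_{ij})$, and let $e_{ij} = p_{ij}^2$. Then
$\sum_{j=1}^{v} e_{ij} = 1$ and
$\sum_{i=1}^{v-1} e_{ij} = 1 - \frac{1}{v}$.

Also $M_d = P'\Lambda_d P$, so that
$m_{djj} = \sum_{i=1}^{v-1} e_{ij}\, \mu_{di}$.

Thus

$$ \sum_{j=1}^{v} f\Big( \frac{v}{v-1}\, m_{djj} \Big) = \sum_{j=1}^{v} f\Big( \sum_{i=1}^{v-1} \frac{v}{v-1}\, e_{ij}\, \mu_{di} \Big). $$

Since the weights $\frac{v}{v-1} e_{ij}$, $i = 1, \ldots, v-1$,
are nonnegative and sum to one for each j, the convexity of f
gives
$f\big( \sum_i \frac{v}{v-1} e_{ij} \mu_{di} \big) \le \sum_i \frac{v}{v-1} e_{ij} f(\mu_{di})$;
summing over j and using $\sum_j e_{ij} = 1$ yields (5.8).

(i) $M_{d^*}$ is c.s.,

(ii) d* minimizes $\sum_{j=1}^{v} f\big( \frac{v}{v-1}\, m_{djj} \big)$,    (5.11)

then d* is $\Phi^*$-optimal.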


Proof:  Follows  directly  from  (5.10). 


Example: In the case of (5.6), we obtain:

(a) If $M_{d^*}$ is c.s. and minimizes $\sum_j m_{djj}^{-p}$,
then d* is $\Phi_p^*$-optimal.

(b) If $M_{d^*}$ is c.s. and maximizes $\sum_j \log m_{djj}$,
then d* is $\Phi_0^*$-optimal (i.e., it is D-optimal).

(c) If $M_{d^*}$ is c.s. and maximizes $\min_j m_{djj}$, then d*
is $\Phi_\infty^*$-optimal (i.e., it is E-optimal).

Also, from Theorem 5.2,

(d) If $M_{d^*}$ is c.s. and maximizes $\sum_j m_{djj}$, then d*
is $\Phi_p$-optimal, $0 \le p \le \infty$, and more.
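A quick numerical check of the diagonal-element bound behind these statements is sketched below. It assumes that inequality (5.8) of Lemma 5.2 reads: the sum of $f(\mu_{di})$ over the v-1 nonzero eigenvalues is at least $\frac{v-1}{v} \sum_j f\big(\frac{v}{v-1} m_{djj}\big)$. This reading, the example matrices, and the function names are assumptions made for illustration, with f(x) = 1/x (the A-type case).

```python
from fractions import Fraction


def lemma_bound(diagonal, f):
    """Right-hand side: ((v-1)/v) * sum_j f(v * m_jj / (v-1))."""
    v = len(diagonal)
    scale = Fraction(v, v - 1)
    return sum(f(scale * m) for m in diagonal) / scale


def criterion(nonzero_eigenvalues, f):
    """Left-hand side: sum of f over the v-1 nonzero eigenvalues."""
    return sum(f(mu) for mu in nonzero_eigenvalues)


def reciprocal(x):
    return 1 / Fraction(x)


# Hypothetical M_d = [[3,-2,-1],[-2,3,-1],[-1,-1,2]]:
# diagonal (3, 3, 2), eigenvalues 5, 3, 0.
lhs = criterion([5, 3], reciprocal)         # 1/5 + 1/3 = 8/15
rhs = lemma_bound([3, 3, 2], reciprocal)    # (2/3)(2/9 + 2/9 + 1/3) = 14/27

# Completely symmetric 4I - (4/3)J: diagonal 8/3 each, eigenvalues 4, 4, 0.
cs_lhs = criterion([4, 4], reciprocal)                    # 1/2
cs_rhs = lemma_bound([Fraction(8, 3)] * 3, reciprocal)    # 1/2
```

The inequality is strict for the first matrix and holds with equality for the completely symmetric one, which is what makes conditions of the form "c.s. and best diagonal" sufficient for optimality.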


6. Type 1 and Type 2 Criteria.

Cheng (1978) refined Kiefer's criteria and defined a larger
class of optimality criteria that includes the A-, E-, D-, and
all $\Phi_p$-criteria, $0 < p < \infty$, and more.

Again, let $\mathcal{C} \subseteq \mathcal{B}_{v,0}$. (In the
$\mathcal{B}_v$ context, similar arguments hold.) Let
$t = \max_{d \in \mathcal{D}} \mathrm{Tr} M_d$.

Definition 6.1. A design $d^* \in \mathcal{D}$ satisfies
optimality criteria of type 1 if d* minimizes
$\Phi_f(M_d) = \sum_{i=1}^{v-1} f(\mu_{di})$, where f is a
real-valued function defined on $[0, t]$ such that

a) f is continuous, strictly convex, and strictly decreasing on
   $[0, t]$. We include here the possibility that
   $f(0) = \lim_{x \to 0^+} f(x) = +\infty$.                  (6.1)

b) f is continuously differentiable on $(0, t)$, and $f'$ is
   strictly concave on $(0, t)$, i.e., $f' < 0$, $f'' > 0$, and
   $f''' < 0$ on $(0, t)$.

Definition 6.2. A design $d^* \in \mathcal{D}$ satisfies
optimality criteria of type 2 if d* minimizes
$\Phi_f(M_d) = \sum_{i=1}^{v-1} f(\mu_{di})$, where f has the
same properties as in Definition 6.1, except that the strict
concavity of $f'$ is replaced by strict convexity, i.e.,
$f''' > 0$ on $(0, t)$.

Also, a generalized optimality criterion of type i (i = 1, 2) is
defined to be the pointwise limit of a sequence of type i
criteria.

From (4.2) and (5.6), the A-, D-, and $\Phi_p$-criteria are of
type 1 and the E-criterion is a generalized criterion of type 1
(being the limit of $\Phi_p$-criteria, as $p \to \infty$). Note
that the A- and D-criteria correspond to the choices of
$f(x) = x^{-1}$ and $-\log x$ respectively.
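As a worked check (a sketch added for illustration, not taken from the source), the sign conditions required of a type 1 criterion can be verified directly for $f(x) = x^{-p}$ with $p > 0$, which includes the A-criterion at p = 1, and for $f(x) = -\log x$, on any interval of positive x:

```latex
\begin{align*}
f(x) &= x^{-p}: &
f'(x) &= -p\,x^{-p-1} < 0, &
f''(x) &= p(p+1)\,x^{-p-2} > 0, &
f'''(x) &= -p(p+1)(p+2)\,x^{-p-3} < 0, \\
f(x) &= -\log x: &
f'(x) &= -x^{-1} < 0, &
f''(x) &= x^{-2} > 0, &
f'''(x) &= -2\,x^{-3} < 0.
\end{align*}
```

In both cases f is strictly decreasing and strictly convex, and $f'$ is strictly concave, with $f(x) \to +\infty$ as $x \to 0^+$.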

Remarks: (i) There do exist functions satisfying the
requirements for a type 2 criterion. For example, let
$f(x) = \varepsilon x^{3} - ax$ over the interval $[0, t]$ of
interest, where $\varepsilon > 0$, $a > 0$, and $\varepsilon$ is
small compared with a.

(ii) From Section 4, if there is a symmetric design which
maximizes $\mathrm{Tr} M_d$ over $\mathcal{D}$, then it is
optimal with respect to a very general class of criteria
including both generalized type 1 and type 2 criteria. □

It appears that most optimality criteria (universal optimality
is an exception) which place equal emphasis on all the
parameters can be formulated in terms of the eigenvalues of the
information matrix. In Section 7 we shall introduce another
optimality criterion of the form
$\Phi(\mu_{d1}, \ldots, \mu_{d(v-1)})$ with $\Phi$ Schur convex
or convex symmetric.

7. Schur optimality.

The  concept  of  Schur  optimality  was  introduced  by  Magda 
(1979).  To  see  how  it  was  defined,  let  us  recall  the  following: 

Definition  7.1.  A  matrix  with  nonnegative  entries  is  called 
doubly  stochastic  if  the  sum  of  the  entries  is  1  in  every  row 
and  every  column. 

Definition 7.2. Let I be an interval on the real line. A
function $\Phi: I^n \to R$ is called Schur convex (after Schur
(1923)) if

$$ \Phi(Sx) \le \Phi(x) $$

for all $x \in I^n$ and every doubly stochastic matrix S. A
Schur convex function is not necessarily convex, e.g.,
$\Phi(x_1, x_2) = |x_1 - x_2|^{1/2}$. Any Schur convex function
is symmetric, because for any permutation matrix P we have

$$ \Phi(Px) \le \Phi(x) = \Phi(P^{-1}Px) \le \Phi(Px). $$

Hence $\Phi(Px) = \Phi(x)$ as desired. We have used the fact
that a permutation matrix and its inverse are examples of
doubly stochastic matrices.

While symmetry is a necessary condition for Schur convexity, it
is by no means sufficient. When convexity is added to symmetry
we can ensure Schur convexity. This is seen as follows: By
Birkhoff (1946) every doubly stochastic matrix S can be written
as a convex sum of permutation matrices. Let
$S = \sum_i \lambda_i P_i$ ($\lambda_i \ge 0$,
$\sum_i \lambda_i = 1$). Then

$$ \Phi(Sx) = \Phi\Big( \sum_i \lambda_i P_i x \Big) \le \sum_i \lambda_i \Phi(P_i x) = \sum_i \lambda_i \Phi(x) = \Phi(x), $$

where the inequality follows from the convexity of $\Phi$ and
the next equality from the symmetry of $\Phi$. This proves Schur
convexity.
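The argument above can be traced numerically. In the sketch below, which is illustrative only, the weights, the vector x and the function names are arbitrary assumptions. A doubly stochastic matrix is built as a convex combination of the six 3 x 3 permutation matrices, and the inequality $\Phi(Sx) \le \Phi(x)$ is checked for the symmetric convex function $\Phi(x) = \sum_i 1/x_i$ on positive vectors.

```python
from itertools import permutations


def permutation_matrix(perm):
    """Matrix P with P[i][perm[i]] = 1, so that (Px)_i = x[perm[i]]."""
    n = len(perm)
    return [[1.0 if perm[i] == j else 0.0 for j in range(n)]
            for i in range(n)]


def convex_combination(weights, matrices):
    """Entrywise weighted sum of equally sized square matrices."""
    n = len(matrices[0])
    return [[sum(w * P[i][j] for w, P in zip(weights, matrices))
             for j in range(n)] for i in range(n)]


def apply(S, x):
    """Matrix-vector product Sx."""
    return [sum(S[i][j] * x[j] for j in range(len(x)))
            for i in range(len(S))]


def phi(x):
    """Symmetric convex function on positive vectors: sum of reciprocals."""
    return sum(1.0 / t for t in x)


perms = list(permutations(range(3)))
P = [permutation_matrix(p) for p in perms]
weights = [0.4, 0.1, 0.2, 0.1, 0.1, 0.1]   # nonnegative, summing to 1
S = convex_combination(weights, P)          # doubly stochastic by construction

x = [4.0, 2.0, 1.0]
print(phi(apply(S, x)), "<=", phi(x))
```

Averaging with S pulls the components of x towards their mean, and any symmetric convex function can only decrease under such averaging.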

Assume $\mathcal{C} \subseteq \mathcal{B}_{v,0}$, let
$I = [0, t]$, and let n be the smallest integer for which
$\mu_{d(n+1)} = \mu_{d(n+2)} = \cdots = \mu_{dv} = 0$ for all
$d \in \mathcal{D}$.

Define $\sigma(M_d)$ to be the following vector in $I^n$:

$$ \sigma(M_d) = (\mu_{d1}, \ldots, \mu_{dn})'. \tag{7.1} $$

For $d \in \mathcal{D}$ and any Schur convex function $\Phi$
defined on $I^n$ and nonincreasing in its arguments, set

$$ \Phi(M_d) = \Phi(\sigma(M_d)). \tag{7.2} $$

Schur optimality is now defined as follows:

Definition 7.3. A design $d^* \in \mathcal{D}$ is called
Schur-optimal if d* minimizes $\Phi(M_d)$ over all
$d \in \mathcal{D}$, for all Schur convex functions $\Phi$
nonincreasing in their arguments.

Note that, if $\phi: I \to R$ is convex, then

$$ \Phi(x) = \sum_{i=1}^{n} \phi(x_i), \qquad x = (x_1, x_2, \ldots, x_n), \tag{7.3} $$

is Schur convex on $I^n$ because $\Phi(\cdot)$ is symmetric and
convex. From (5.6), the D-, A-, and all $\Phi_p$-criteria
defined so far on the eigenvalues of the information matrices
are instances of Schur convex functions. As a symmetric and
convex function on $I^n$,

$$ E(x_1, \ldots, x_n) = -\min_{1 \le i \le n} (x_1, x_2, \ldots, x_n) $$

is also Schur convex. This function is associated with
E-optimality. Note that E-optimality is no longer a limiting
case when dealt with as a Schur convex function. To prove Schur
optimality, we state the following very useful tool.

Theorem 7.1. (derived from Ostrowski (1952)).

Let $f(x_1, \ldots, x_n)$ be a Schur convex function,
nonincreasing in its arguments, on $I^n$. Let

$$ y_1 \le y_2 \le \cdots \le y_n; \qquad x_1 \le x_2 \le \cdots \le x_n \tag{7.4} $$

satisfy the following:

$$ y_1 + \cdots + y_l \ge x_1 + \cdots + x_l \quad \text{for all } 1 \le l \le n. \tag{7.5} $$

Then

$$ f(y_1, \ldots, y_n) \le f(x_1, \ldots, x_n). $$

For convenience, when two vectors x and y in $I^n$ satisfy (7.4)
and (7.5) we write $y \prec x$.
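The ordering defined by (7.4) and (7.5) is easy to test mechanically. The following is a minimal sketch in which the function name, the tolerance, and the example vectors are assumptions. It reads (7.4) as ascending order and (7.5) as requiring the ascending partial sums of y to dominate those of x, and then checks the conclusion of the theorem for one Schur convex, nonincreasing function.

```python
def precedes(y, x, tol=1e-12):
    """True if, after sorting both vectors in ascending order, every
    partial sum of y is at least the corresponding partial sum of x."""
    ys, xs = sorted(y), sorted(x)
    if len(ys) != len(xs):
        raise ValueError("vectors must have the same length")
    sum_y = sum_x = 0.0
    for a, b in zip(ys, xs):
        sum_y += a
        sum_x += b
        if sum_y < sum_x - tol:
            return False
    return True


def f(x):
    """Schur convex and nonincreasing on positive vectors (A-type)."""
    return sum(1.0 / t for t in x)


y = [2.0, 2.0, 2.0]   # hypothetical eigenvalue vectors with equal sums
x = [1.0, 2.0, 3.0]

print(precedes(y, x), f(y), f(x))   # the relation holds and f(y) <= f(x)
print(precedes(x, y))               # the reverse relation fails
```

Because the comparison uses sorted copies, the order in which the eigenvalues are supplied does not matter.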

Applying  the  above  result  we  can  immediately  conclude: 


Theorem 7.3. d* is Schur optimal if
$\sigma(M_{d^*}) \prec \sigma(M_d)$ for all $d \in \mathcal{D}$.

It  should  be  pointed  out  that  the  ordered  partial  sums 
in  (7.4)  are  examples  of  Schur  convex  functions.  Further 
useful  results  can  be  obtained  in  Hardy  and  Littlewood  (1967). 


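Lemma 7.2. Let $M_d \in \mathcal{C} \subseteq \mathcal{B}_{v,0}$
and let $P_i$ ($1 \le i \le n$) be n orthogonal matrices such
that $M_d^{(i)} = P_i^{-1} M_d P_i$ also satisfies
$M_d^{(i)} \mathbf{1} = 0$ for all $1 \le i \le n$. Set
$\bar{M}_d = \sum_{i=1}^{n} M_d^{(i)}/n$. Then for any Schur
convex function $\Phi$ nonincreasing in its arguments we have
$\Phi(\bar{M}_d) \le \Phi(M_d)$.

Proof: Since the $P_i$'s are orthogonal, we have
$\sigma(M_d^{(i)}) = \sigma(M_d)$ and hence
$\Phi(M_d^{(i)}) = \Phi(M_d)$ for all $1 \le i \le n$. Moreover,
let $\{\mu_{di}\}$ and $\{\bar{\mu}_{di}\}$ denote the
eigenvalues of $M_d$ and $\bar{M}_d$ respectively (and let them
be ordered nonincreasingly). Then it is known (see Bellman
(1970)) that

$$ \sum_{i=1}^{l} \bar{\mu}_{di} \le \sum_{i=1}^{l} \mu_{di} \quad \text{for } l = 1, 2, \ldots, v-1, $$

with equality for l = v-1 because
$\mathrm{Tr}(\bar{M}_d) = \mathrm{Tr}(M_d)$. Equivalently, the
sums of the smallest eigenvalues of $\bar{M}_d$ dominate those
of $M_d$, as required in (7.5). By Theorem 7.1 we obtain
$\Phi(\bar{M}_d) \le \Phi(M_d)$.

Remark: We call $\bar{M}_d$ (defined in Lemma 7.2) an averaged
version of $M_d$.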


Verifying the requirements of Theorem 7.3 is difficult because
of the large variety of information matrices $M_d$. It is
practically impossible to find $\sigma(M_d)$. When averaging
$M_d$ properly, however, it is easily seen that finding
$\sigma(\bar{M}_d)$ is a tractable task. Hence comparing
$\sigma(M_{d^*})$ and $\sigma(\bar{M}_d)$ (in view of Theorem
7.4) is often possible.

Theorem 7.4. d* is Schur optimal if
$\sigma(M_{d^*}) \prec \sigma(\bar{M}_d)$ for all
$d \in \mathcal{D}$, where $\bar{M}_d$ is some averaged version
of $M_d$.

Proof. $\Phi(M_{d^*}) \le \Phi(\bar{M}_d) \le \Phi(M_d)$, where
the first inequality holds from the assumption
$\sigma(M_{d^*}) \prec \sigma(\bar{M}_d)$ and the latter from
Lemma 7.2.

Closing remarks: We refer the reader to the "Special Issue on
Optimal Design Theory", No. 14, Vol. A7 (1978) of Communications
in Statistics (edited by this author) for further ideas, results
and references. Currently we are preparing a book on the subject
of optimal design of experiments. The book should be available
for distribution within a year or so. Meanwhile, the interested
reader can obtain preliminary versions of some chapters of the
book.